N-Gram Language Model Compression Using Scalar Quantization and Incremental Coding

Authors

  • Shuo DI
  • Lei ZHANG
  • Zheng CHEN
  • Eric CHANG
  • Kai-Fu LEE
Abstract

This paper describes a novel approach to compressing large trigram language models. It uses scalar quantization to compress log probabilities and back-off coefficients, and incremental coding to compress entry pointers. Experiments show that the new approach achieves a compression ratio of roughly 2.5 relative to the well-known tree-bucket format while keeping perplexity and access speed almost unchanged. The high compression ratio makes the method usable in various SLM-based applications, such as Pinyin input methods and dictation, on handheld devices with little available memory.
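The abstract does not say which quantizer or pointer encoding the authors use, so the following is only a minimal sketch of the two ideas under stated assumptions: a uniform 8-bit scalar quantizer stands in for the paper's quantizer, and delta coding with variable-length bytes stands in for its incremental coding of sorted entry pointers. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_uniform_codebook(values: Sequence[float], bits: int = 8) -> Tuple[float, float]:
    """Return (lowest value, step size) of a uniform quantizer over the value range.

    Assumption: a uniform quantizer; the paper's actual codebook design is not
    given in the abstract.
    """
    lo, hi = min(values), max(values)
    levels = 1 << bits
    step = (hi - lo) / (levels - 1) if hi > lo else 1.0
    return lo, step


def quantize(value: float, lo: float, step: float, bits: int = 8) -> int:
    """Map a float (e.g. a log probability) to a small integer code."""
    code = round((value - lo) / step)
    return max(0, min((1 << bits) - 1, code))


def dequantize(code: int, lo: float, step: float) -> float:
    """Recover the approximate float from its integer code."""
    return lo + code * step


def delta_encode(pointers: Sequence[int]) -> bytes:
    """Store non-decreasing pointers as gaps, each gap as a base-128 varint."""
    out = bytearray()
    prev = 0
    for p in pointers:
        gap = p - prev
        if gap < 0:
            raise ValueError("pointers must be non-decreasing")
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # high bit set: more bytes follow
            gap >>= 7
        out.append(gap)
        prev = p
    return bytes(out)


def delta_decode(data: bytes) -> List[int]:
    """Invert delta_encode by summing the decoded gaps."""
    pointers: List[int] = []
    prev = gap = shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += gap
            pointers.append(prev)
            gap = shift = 0
    return pointers


if __name__ == "__main__":
    log_probs = [-0.5, -1.25, -3.0, -7.75]  # made-up illustrative values
    lo, step = build_uniform_codebook(log_probs)
    codes = [quantize(v, lo, step) for v in log_probs]
    print(codes, [dequantize(c, lo, step) for c in codes])

    entry_pointers = [0, 3, 3, 200, 100000]  # made-up sorted offsets
    packed = delta_encode(entry_pointers)
    print(len(packed), delta_decode(packed))
```

In this sketch each quantized value takes one byte instead of a 4-byte float, at the cost of a reconstruction error of at most half a quantization step, and small gaps between consecutive pointers take a single byte each. Whether these choices match the trade-offs reported in the paper is not something the abstract confirms.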


Similar resources

Coding Algorithm Based on Loss Compressing using Scalar Quantization Switching Technique and Logarithmic Companding

This paper proposes a novel coding algorithm based on lossy compression using a scalar quantization switching technique. Switching is performed by estimating the input variance, followed by coding with a Nonuniform Switched Scalar Compandor (NSSC). An accurate estimate of the input signal variance is needed when finding the best compressor function for a compandor implementation. I...


Quantization techniques pdf

This paper proposes some fast and simple quantization techniques to display. The main reason for adopting different techniques in vector quantizers (VQ) is to design an optimal quantizer. Since the actual probability distributions of image. Abstract: Image Compression is a technique for competently coding digital. Vector Quantization (VQ) is a block-coding technique that quantizes blocks of data. Defi...


Color Video Compression Based on Chrominance Vector Quantization

This paper proposes a compression technique to improve color quality in very low bit rate video coding. The general idea is to convert the two chrominance components into one scalar chrominance, which is then processed further. The scalar representation of chrominance is obtained through vector quantization in the chrominance plane. Each (CB, CR) vector is represented by a scalar index to a ...


Wavelet Transform Coding With Linear Prediction And The Optimal Choice Of Wavelet Basis

Wavelet transform based coding has been shown to be a promising method for low bit rate data compression. By using its multiresolution characteristics and the dependencies among subbands, the important visual features can be reconstructed at high compression ratios. In this paper, we propose a new wavelet transform coding scheme which exploits the linear prediction model for the existing dependencies ...


A Succinct N-gram Language Model

Efficient processing of tera-scale text data is an important research topic. This paper proposes lossless compression of N-gram language models based on LOUDS, a succinct data structure. LOUDS succinctly represents a trie with M nodes as a (2M + 1)-bit string. We compress it further for the N-gram language model structure. We also use ‘variable length coding’ and ‘block-wise compression’ to comp...



Publication date: 2000